
United States Patent and Trademark Office 



UNITED STATES DEPARTMENT OF COMMERCE 
United States Patent and Trademark Office 
Address: COMMISSIONER FOR PATENTS 
P.O. Box 1450 

Alexandria, Virginia 22313-1450 
www.uspto.gov 



APPLICATION NO.: 10/733,178
FILING DATE: 12/10/2003
FIRST NAMED INVENTOR: Eric P. Jiang
ATTORNEY DOCKET NO.: 4272.68-13
CONFIRMATION NO.: 8293

7590 08/07/2006
Marc A. Hubbard
Munsch Hardt Kopf & Harr, P.C.
4000 Fountain Place
1445 Ross Avenue
Dallas, TX 75202-2790

EXAMINER: LO, SUZANNE
ART UNIT: 2128
PAPER NUMBER:

DATE MAILED: 08/07/2006 



Please find below and/or attached an Office communication concerning this application or proceeding. 



PTO-90C (Rev. 10/03) 



Office Action Summary

Application No.: 10/733,178
Applicant(s): JIANG ET AL.
Examiner: Suzanne Lo
Art Unit: 2128





- The MAILING DATE of this communication appears on the cover sheet with the correspondence address 
Period for Reply 



A SHORTENED STATUTORY PERIOD FOR REPLY IS SET TO EXPIRE 3 MONTH(S) OR THIRTY (30) DAYS, 
WHICHEVER IS LONGER, FROM THE MAILING DATE OF THIS COMMUNICATION. 

- Extensions of time may be available under the provisions of 37 CFR 1.136(a). In no event, however, may a reply be timely filed 
after SIX (6) MONTHS from the mailing date of this communication. 

- If NO period for reply is specified above, the maximum statutory period will apply and will expire SIX (6) MONTHS from the mailing date of this communication. 

- Failure to reply within the set or extended period for reply will, by statute, cause the application to become ABANDONED (35 U.S.C. § 133). 
Any reply received by the Office later than three months after the mailing date of this communication, even if timely filed, may reduce any 
earned patent term adjustment. See 37 CFR 1.704(b). 

Status 

1) [X] Responsive to communication(s) filed on 10 December 2003.
2a) [ ] This action is FINAL. 2b) [X] This action is non-final.

3) [ ] Since this application is in condition for allowance except for formal matters, prosecution as to the merits is
closed in accordance with the practice under Ex parte Quayle, 1935 C.D. 11, 453 O.G. 213.

Disposition of Claims 

4) [X] Claim(s) 1-56 is/are pending in the application.

4a) Of the above claim(s) is/are withdrawn from consideration.

5) [ ] Claim(s) is/are allowed.

6) [X] Claim(s) 1-56 is/are rejected.

7) [ ] Claim(s) is/are objected to.

8) [ ] Claim(s) are subject to restriction and/or election requirement.

Application Papers 

9) [ ] The specification is objected to by the Examiner.

10) [X] The drawing(s) filed on 10 December 2003 is/are: a) [X] accepted or b) [ ] objected to by the Examiner.

Applicant may not request that any objection to the drawing(s) be held in abeyance. See 37 CFR 1.85(a).

Replacement drawing sheet(s) including the correction is required if the drawing(s) is objected to. See 37 CFR 1.121(d).
11) [X] The oath or declaration is objected to by the Examiner. Note the attached Office Action or form PTO-152.

1. [ ] Certified copies of the priority documents have been received.

2. [ ] Certified copies of the priority documents have been received in Application No. ___.

3. [ ] Copies of the certified copies of the priority documents have been received in this National Stage
application from the International Bureau (PCT Rule 17.2(a)).
* See the attached detailed Office action for a list of the certified copies not received.

DETAILED ACTION

1. Claims 1-56 have been presented for examination.

PRIORITY 

2. Acknowledgment is made of applicant's claim for priority to provisional application 60/432,631
filed on 12/10/2002.

Oath/Declaration

4. The oath or declaration is defective. A new oath or declaration in compliance with 37 

CFR 1.67(a) identifying this application by application number and filing date is required. See MPEP 
§§ 602.01 and 602.02. The oath or declaration is defective because: signatures are missing for inventors 
Andrew John Caffrey, Karen Christiana Joiner-Congleton, and Yong M. Kim. 

Claim Objections 

5. Claims 11 and 39 are objected to because of the following informalities:

Claims 11 and 39 contain the word "amean". Appropriate correction is required.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows: 

Whoever invents or discovers any new and useful process, machine, manufacture, or composition 
of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the 
conditions and requirements of this title. 

6. Claims 1-56 are rejected under 35 U.S.C. 101 because the claimed invention is directed to
non-statutory subject matter.

Specifically, in claims 1-28 the broadest reasonable interpretation of the method would result in 
merely abstract mathematical steps. Although the preamble discloses a computer-based system, the claim 
limitations are not necessarily directed to software. Assuming for the sake of argument that the claims are 
directed to software, they would still not produce a tangible output or result and do not allow their 
usefulness to be realized. In addition, claims 29-56 do not produce a tangible output or result and do not
allow their usefulness to be realized. 



Claim Rejections - 35 USC § 112

The following is a quotation of the second paragraph of 35 U.S.C. 112:

The specification shall conclude with one or more claims particularly pointing out and distinctly
claiming the subject matter which the applicant regards as his invention.

7. Claims 1-28 are rejected under 35 U.S.C. 112, second paragraph, as being indefinite for failing to
particularly point out and distinctly claim the subject matter which applicant regards as the invention. It
is unclear which statutory category claims 1-28 fall under, as the preamble of the claims states, "in a
computer-based system, a method of building a statistical model" and could refer either to a computer-based
method of building a statistical model or to a computer-based system with means for executing a method of
building a statistical model.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all obviousness
rejections set forth in this Office action:

(a) A patent may not be obtained though the invention is not identically disclosed or described as
set forth in section 102 of this title, if the differences between the subject matter sought to be
patented and the prior art are such that the subject matter as a whole would have been obvious at
the time the invention was made to a person having ordinary skill in the art to which said subject
matter pertains. Patentability shall not be negatived by the manner in which the invention was
made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966),
that are applied for establishing a background for determining obviousness under 35 U.S.C. 103(a) are
summarized as follows:

1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or
nonobviousness.

This application currently names joint inventors. In considering patentability of the claims under 
35 U.S.C. 103(a), the examiner presumes that the subject matter of the various claims was commonly 
owned at the time any inventions covered therein were made absent any evidence to the contrary. 
Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and invention dates of 
each claim that was not commonly owned at the time a later invention was made in order for the examiner 
to consider the applicability of 35 U.S.C. 103(c) and potential 35 U.S.C. 102(e), (f) or (g) prior art under
35 U.S.C. 103(a). 

8. Claims 1-56 are rejected under 35 U.S.C. 103(a) as being unpatentable over Applicants' own 
admission that a method and system automatically performs many or all of the steps of statistical analysis 
described in the background of the application. 

Claims 1-56 appear to be directed to the automation (page 3 of Specification, [0009]) of a 
manual activity utilizing the steps disclosed by the Applicant in the background as well as certain sections 
of the specification of the instant application. In re Venner, 262 F.2d 91, 95, 120 USPQ 193, 194 (CCPA 
1958), the court held that broadly providing an automatic or mechanical means to replace a manual 
activity which accomplished the same result is not sufficient to distinguish over the prior art. 

As per claim 1, Applicants' own admission is directed to in a computer-based system, a method 
of building a statistical model, comprising: automatically identifying and flagging categorical variables in 
a data set containing both categorical and continuous variables; automatically identifying categorical 
variables that are correlated with one or more continuous variables and eliminating categorical variable 
that are correlated with at least one continuous variable from a training data matrix used to build a 
statistical model, wherein the training data matrix comprises a subset of the original data set; and building 
the statistical model based on the training data matrix (page 2 of Specification, [0006]) and therefore 
known in the art at the time of the invention. 

As per claim 2, Applicants' own admission is directed to the method of claim 1 wherein said step 
of automatically identifying and flagging categorical variables comprises: determining if a variable 
contains integer observation values; if the variable contains integer values, determining the number of
unique integer values contained in the variable; determining if the number of unique values exceeds a 
predetermined threshold value; and if the number of unique values does not exceed the threshold value, 
flagging the variable as a categorical variable (page 2 of Specification, Sections "Data Exploration" 
and "Categorical Variable Pre-preprocessing", [0006]) therefore known in the art at the time of the 
invention. 
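
For illustration only, the flagging steps recited in claim 2 might be sketched as follows. The function name `flag_categorical` and the default threshold `max_unique=10` are assumptions made for this sketch; neither the claim nor the cited passages fix a particular threshold.

```python
def flag_categorical(values, max_unique=10):
    """Return True if a variable would be flagged categorical under the claim 2 steps.

    max_unique is a hypothetical threshold chosen for illustration.
    """
    # Step 1: determine whether the variable contains integer observation values.
    if not all(float(v).is_integer() for v in values):
        return False
    # Step 2: determine the number of unique integer values.
    n_unique = len({int(v) for v in values})
    # Steps 3-4: flag as categorical when the count does not exceed the threshold.
    return n_unique <= max_unique
```

A variable holding a handful of repeated integer codes is flagged, while a variable with fractional values or with more distinct integers than the threshold is not.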

As per claim 3, Applicants' own admission is directed to the method of claim 2 further 
comprising: if the number of unique values exceeds the threshold value, determining if the variable has 
predictive strength greater than a predetermined value of Pearson's r; if the variable has predictive 
strength greater than the predetermined value of Pearson's r, flagging the variable as a continuous 
variable; if the variable has predictive strength less than the predetermined value of Pearson's r, reducing 
the number of unique values by eliminating those unique values containing less than a predetermined 
number of entries so as to create a reduced variable set with a reduced number of unique values; 
determining if the reduced number of unique values exceeds the threshold value; and if the reduced 
number of unique values does not exceed the threshold value, flagging the variable as a categorical 
variable, else flagging the variable as a continuous variable (page 10, [0040] and page 2 of 
Specification, Sections "Data Exploration" and "Categorical Variable Pre-preprocessing", [0006]) 
therefore known in the art at the time of the invention. 

As per claim 4, Applicants' own admission is directed to the method of claim 1 wherein said step 
of automatically identifying categorical variables that are highly correlated with one or more continuous 
variables comprises: binning at least one continuous variable so as to convert the continuous variable into 
a pseudo-categorical variable; and calculating a Cramer's V value between at least one categorical
variable and the pseudo-categorical variable to obtain an estimated measure of co-linearity between the
categorical variable and the continuous variable (page 10, [0040] and page 2 of Specification, Sections
"Data Exploration" and "Categorical Variable Pre-preprocessing", [0006]) therefore known in the
art at the time of the invention. 
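
As a rough sketch of the two steps recited in claim 4, the following bins a continuous variable and then computes Cramer's V from the resulting contingency table. Equal-width binning, the bin count, and the helper names `bin_equal_width` and `cramers_v` are assumptions for illustration; the claim does not specify a binning scheme.

```python
import math
from collections import Counter

def bin_equal_width(values, n_bins=4):
    """Convert a continuous variable into a pseudo-categorical one (a bin index per value)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # Clamp so the maximum value falls into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def cramers_v(a, b):
    """Cramer's V between two equal-length categorical sequences."""
    n = len(a)
    joint = Counter(zip(a, b))
    count_a, count_b = Counter(a), Counter(b)
    # Chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for x in count_a:
        for y in count_b:
            expected = count_a[x] * count_b[y] / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(count_a), len(count_b)) - 1
    if k == 0:
        return 0.0  # one variable is constant, so no association can be measured
    return math.sqrt(chi2 / (n * k))
```

A value near 1 suggests the categorical variable largely duplicates the binned continuous variable; a value near 0 suggests little association.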

As per claim 5, Applicants' own admission is directed to the method of claim 1 further 
comprising: calculating a correlation value for each variable in the training data matrix with respect to a 
target variable; sorting the variables based on their correlation with the target variable; and retaining a 
predetermined number of variables having the highest correlation values and eliminating any remaining 
variables from the training data matrix (page 2 of Specification, Section "Variable Reduction" [0006] 
and page 10, [0042]) therefore known in the art at the time of the invention. 
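
The variable-reduction steps recited in claim 5 can be illustrated with a minimal sketch. It assumes Pearson's r as the correlation value and ranks by absolute value; both choices, and the names `pearson_r` and `keep_top_k`, are assumptions rather than details taken from the application.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant variable is treated as uncorrelated
    return sxy / math.sqrt(sxx * syy)

def keep_top_k(columns, target, k):
    """Return the names of the k variables most correlated with the target.

    columns maps a variable name to its list of values; ranking uses |r|.
    """
    ranked = sorted(
        columns,
        key=lambda name: abs(pearson_r(columns[name], target)),
        reverse=True,
    )
    return ranked[:k]
```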

As per claim 6, Applicants' own admission is directed to the method of claim 1 further 
comprising: expanding each categorical variable contained in the training data matrix into a plurality of 
dummy variables; measuring a predictive strength for each dummy variable and continuous variable in 
the training data matrix toward a target variable; determining if any pair of variables in the set of dummy 
and continuous variables exhibits a pair- wise correlation greater than a predetermined threshold; and if a 
pair of variables exhibits a pair-wise correlation greater than the threshold, eliminating one of the 
variables in the pair from the training data matrix, wherein the eliminated variable exhibits less predictive 
strength toward the target variable than the non-eliminated variable in the pair (page 2 of Specification, 
Section "Variable Reduction" [0006] and page 10, [0042]) therefore known in the art at the time of the 
invention. 
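
A minimal sketch of the claim 6 steps follows: categorical variables are expanded into 0/1 dummy variables, and for any pair whose pair-wise correlation exceeds a threshold, the variable with less predictive strength toward the target is eliminated. Using |Pearson's r| as the measure of predictive strength, the 0.9 threshold, and the helper names are all assumptions for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def expand_dummies(name, values):
    """Expand one categorical variable into one 0/1 indicator column per category."""
    return {
        f"{name}={c}": [1 if v == c else 0 for v in values]
        for c in sorted(set(values))
    }

def drop_collinear(columns, target, max_pair_r=0.9):
    """Return the variable names kept after pair-wise elimination."""
    # Predictive strength of each variable toward the target (assumed: |r|).
    strength = {n: abs(pearson_r(v, target)) for n, v in columns.items()}
    names = list(columns)
    kept = list(names)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if a in kept and b in kept:
                if abs(pearson_r(columns[a], columns[b])) > max_pair_r:
                    # Eliminate the member of the pair with less predictive strength.
                    kept.remove(a if strength[a] < strength[b] else b)
    return kept
```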

As per claim 7, Applicants' own admission is directed to the method of claim 1 further 
comprising: creating a plurality of principle components from the variables contained in the training data 
matrix, wherein each principle component comprises a linear combination of variables; sorting the 
plurality of principle components by how much variance of the training data matrix each component 
captures; selecting a subset of the plurality of principle components that captures a variance greater than a 
predetermined percentage of total variance; and using the selected principle components to build the 
statistical model (page 2 of Specification, [0006], Sections "Create Model" and "Model Selection"
and page 18, [0069]-[0072], and [0076]) therefore known in the art at the time of the invention.

As per claim 8, Applicants' own admission is directed to the method of claim 7 wherein said step 
of using the selected principle components to build the statistical model comprises: performing a singular 
value decomposition (SVD) to generate a loading matrix; and mapping coefficients calculated for the 
principle components back to corresponding variables of the training data matrix using the loading matrix 
(page 2 of Specification, [0006], Sections "Create Model" and "Model Selection" and page 18, 
[0069]-[0072], and [0076]) therefore known in the art at the time of the invention. 

As per claim 9, Applicants' own admission is directed to the method of claim 1 further 
comprising: performing a singular value decomposition (SVD) analysis using the variables contained in 
the training data matrix if the number of records in the training data matrix is less than a predetermined 
value; and otherwise, performing a conjugate gradient descent (CGD) analysis on a residual sum of 
squares based on the variables contained in the training data matrix if the number of records in the 
training data matrix is greater than or equal to the predetermined value (page 2 of Specification, [0006], 
Sections "Create Model" and "Model Selection" and page 18, [0069]-[0072], and [0076]) therefore 
known in the art at the time of the invention. 

As per claim 10, Applicants' own admission is directed to the method of claim 1 further 
comprising: detecting outlier values in the data set; and for each detected outlier value, presenting a user 
with the following three options for handling the outlier value: (1) substitute the outlier value with a 
maximum or minimum non-outlier value in the data set; (2) keep the outlier value in the data set; (3) 
delete the record corresponding to the outlier value (page 2 of Specification, [0006], Section "Data 
Cleansing", and page 11, [0044]) therefore known in the art at the time of the invention. 
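
The three handling options recited in claim 10 might be sketched as below. The claim does not say how outliers are detected, so the rule used here (more than `k` population standard deviations from the mean) is purely an assumption, as are the option labels and the function name. The sketch also applies one chosen option to every detected outlier, whereas the claim presents the options to a user for each outlier value.

```python
import statistics

def handle_outliers(values, option, k=2.0):
    """Apply one of the three claim 10 options to a list of numeric values.

    option: "substitute", "keep", or "delete" (hypothetical labels).
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return list(values)  # no spread, so nothing can be an outlier
    is_outlier = [abs(v - mean) > k * std for v in values]
    inliers = [v for v, out in zip(values, is_outlier) if not out]
    if option == "keep":
        # Option (2): keep the outlier value in the data set.
        return list(values)
    if option == "delete":
        # Option (3): delete the record corresponding to the outlier value.
        return inliers
    if option == "substitute":
        # Option (1): replace with the maximum or minimum non-outlier value.
        lo, hi = min(inliers), max(inliers)
        return [
            (hi if v > hi else lo) if out else v
            for v, out in zip(values, is_outlier)
        ]
    raise ValueError(f"unknown option: {option!r}")
```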

As per claim 11, Applicants' own admission is directed to the method of claim 1 further 
comprising: detecting missing values in the data set; and for each missing value of a variable, inserting 
amean value of non-missing values of the variable in place of the missing value in the data set (page 2 of
Specification, [0006], Section "Data Cleansing") therefore known in the art at the time of the invention. 
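
The missing-value step recited in claim 11 amounts to mean imputation, which can be sketched as follows. Representing a missing value as `None` and the function name `impute_mean` are assumptions for illustration.

```python
def impute_mean(values):
    """Replace each missing value (None) with the mean of the non-missing values."""
    present = [v for v in values if v is not None]
    if not present:
        return list(values)  # nothing to average, so leave the variable unchanged
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]
```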

As per claim 12, Applicants' own admission is directed to the method of claim 1 further 
comprising: automatically detecting continuous variables having an exponential distribution; and
log-scaling those continuous variables using the following formula:

bx(i) = 1 - e^((x(i) - min) / (mean - min))

where x(i) is a continuous variable being analyzed, min, and mean is the minimum value and the mean value of
the variable in samples, respectively (page 2 of Specification, Section "Variable Standardization" and
page 13, [0057]-[0053]) therefore known in the art at the time of the invention.

As per claim 13, Applicants' own admission is directed to the method of claim 12 further 
comprising normalizing all the variables in the training data matrix (page 2 of Specification, Section
"Variable Standardization") therefore known in the art at the time of the invention.

As per claim 14, Applicants' own admission is directed to the method of claim 1 further 
comprising randomly splitting the data set into a subset of training variables and a subset of test variables, 
wherein the training variables are used to create the training data matrix for building the model and the 
subset of test variables are subsequently used to test the resulting model (page 2 of Specification, 
Sections "Split Data Set" and "Model Validation") therefore known in the art at the time of the 
invention. 
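
A minimal sketch of the random split recited in claim 14 is given below. It assumes the split is made over records (rows) of the data set, and the 30% test fraction, the fixed seed, and the function name `split_records` are illustrative assumptions only.

```python
import random

def split_records(records, test_fraction=0.3, seed=0):
    """Randomly split records into (training, test) subsets."""
    rng = random.Random(seed)  # a fixed seed keeps the split reproducible
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]
```

The training subset would be used to build the training data matrix, and the held-out test subset to test the resulting model.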

As per claim 15, Applicants' own admission is directed to the method of claim 14 wherein prior 
to using the subset of test variables to test the model, pre-processing is performed on variables in the test 
set so as to create a test data matrix containing the same variables and same format as the training data 
matrix (page 2 of Specification, [0006], Sections "Data Exploration" and "Categorical Variable 
Preprocessing") therefore known in the art at the time of the invention. 

As per claims 16-28, the claims are directed to methods with the same limitations as claims 1-15 
and therefore rejected over the same art. 
As per claims 29-56, the claims are directed to a computer-readable medium containing code that,
when executed, performs method steps with the same limitations as claims 1-28 and are therefore rejected
over the same art.

9. Claims 1-3, 6, 12-13, 16-17, 21-23, 28-31, 33, 40-41, 44-45, 50-51, and 56 are rejected under 35
U.S.C. 103(a) as being unpatentable over Wang et al. (U.S. Patent No. 6,470,229 B1) in view of Brown
et al. (U.S. Patent No. 6,473,080 B1).

As per claim 1, Wang is directed to in a computer-based system (column 12, line 66 - column
13, line 1), a method of building a statistical model, comprising: automatically identifying and flagging
categorical variables in a data set containing both categorical and continuous variables (column 3, lines 
60-67); automatically identifying categorical variables that are correlated with one or more continuous 
variables and eliminating categorical variable that are correlated with at least one continuous variable 
from training data used to build a statistical model (column 4, lines 27-35), wherein the training data 
comprises a subset of the original data set (column 4, lines 36-42); and building the statistical model 
based on the training data (column 3, lines 49-53) but fails to specifically disclose the training data in a 
matrix. Brown teaches organizing data in a matrix (column 6, lines 41-53). Wang and Brown are
analogous art because they are from the same field of endeavor, building statistical models. It would have
been obvious to an ordinary person skilled in the art at the time of the invention to combine the statistical
model building method of Wang with the data organization of Brown in order to create a data architecture
that is easily navigable (Brown, column 5, lines 20-21).

As per claim 2, the combination of Wang and Brown already discloses the method of claim 1 
wherein said step of automatically identifying and flagging categorical variables comprises: determining 
if a variable contains integer observation values; if the variable contains integer values, determining the 
number of unique integer values contained in the variable; determining if the number of unique values 
exceeds a predetermined threshold value; and if the number of unique values does not exceed the 
threshold value, flagging the variable as a categorical variable (Wang, column 4, lines 8-35). 

As per claim 3, the combination of Wang and Brown already discloses the method of claim 2 
further comprising: if the number of unique values exceeds the threshold value, determining if the 
variable has predictive strength greater than a predetermined value of Pearson's r; if the variable has 
predictive strength greater than the predetermined value of Pearson's r, flagging the variable as a 
continuous variable; if the variable has predictive strength less than the predetermined value of Pearson's 
r, reducing the number of unique values by eliminating those unique values containing less than a 
predetermined number of entries so as to create a reduced variable set with a reduced number of unique 
values; determining if the reduced number of unique values exceeds the threshold value; and if the 
reduced number of unique values does not exceed the threshold value, flagging the variable as a 
categorical variable, else flagging the variable as a continuous variable (Brown, column 12, lines 15-35). 

As per claim 6, the combination of Wang and Brown already discloses the method of claim 1
further comprising: expanding each categorical variable contained in the training data matrix into a 
plurality of dummy variables; measuring a predictive strength for each dummy variable and continuous 
variable in the training data matrix toward a target variable; determining if any pair of variables in the set 
of dummy and continuous variables exhibits a pair-wise correlation greater than a predetermined 
threshold; and if a pair of variables exhibits a pair-wise correlation greater than the threshold, eliminating 
one of the variables in the pair from the training data matrix, wherein the eliminated variable exhibits less 
predictive strength toward the target variable than the non-eliminated variable in the pair (Wang, column 
10, lines 32-63). 

As per claim 12, the combination of Wang and Brown already discloses the method of claim 1 
further comprising: automatically detecting continuous variables having an exponential distribution; and 
log-scaling those continuous variables using the following formula:

bx(i) = 1 - e^((x(i) - min) / (mean - min))

where x(i) is a continuous variable being analyzed, min, and mean is the minimum value and the mean
value of the variable in samples, respectively (Brown, column 9, lines 22-33 and column 10, lines 36-45).

As per claim 13, the combination of Wang and Brown already discloses the method of claim 12 
further comprising normalizing all the variables in the training data matrix (Brown, column 9, lines 22- 
33). 

As per claims 16-17, 21-22, 23, and 28, the claims are directed to methods with the same 
limitations as claims 1-3, 6, and 12-13 above and therefore rejected under the same art combination. 

As per claims 29-31, 33, 40-41, 44-45, 50-51, and 56, the claims are directed to a computer-readable
medium containing code that, when executed, performs method steps with the same limitations as
claims 1-3, 6, and 12-13 above and are therefore rejected under the same art combination.

10. Claims 7-8, 18-19, 25-26, 35-36, 46-47, 49, and 53-54 are rejected under 35 U.S.C. 103(a) as
being unpatentable over Wang et al. (U.S. Patent No. 6,470,229 B1) and Brown et al. (U.S. Patent No.
6,473,080 B1) in further view of Vaithyanathan et al. (U.S. Patent No. 5,819,258).

As per claim 7, the combination of Brown and Wang is directed to the method of claim 1 but 
fails to specifically disclose further comprising: creating a plurality of principle components from the 
variables contained in the training data matrix, wherein each principle component comprises a linear 
combination of variables; sorting the plurality of principle components by how much variance of the 
training data matrix each component captures; selecting a subset of the plurality of principle components 
that captures a variance greater than a predetermined percentage of total variance; and using the selected 
principle components to build the statistical model. Vaithyanathan teaches using the method of principal
component analysis (column 8, line 61-column 9, line 4). Brown, Wang, and Vaithyanathan are
analogous art because they are all from the same field of endeavor, building a statistical model. It would
have been obvious to an ordinary person skilled in the art at the time of the invention to combine the 
statistical model building method of Wang and Brown with the PCA method of Vaithyanathan in order to 
reduce the data set for manageability (Vaithyanathan, column 8, lines 61-67). 

As per claim 8, the combination of Wang, Brown, and Vaithyanathan already discloses the
method of claim 7 wherein said step of using the selected principle components to build the statistical 
model comprises: performing a singular value decomposition (SVD) to generate a loading matrix; and 
mapping coefficients calculated for the principle components back to corresponding variables of the 
training data matrix using the loading matrix (column 9, lines 5-65). 

As per claims 18-19 and 25-26, the claims are directed to methods with the same limitations as 
claims 7-8 above and therefore rejected under the same art combination. 

As per claims 35-36, 46-47, 49, and 53-54, the claims are directed to a computer-readable
medium containing code that, when executed, performs method steps with the same limitations as claims 7-8
and 12 above and are therefore rejected under the same art combination.

Conclusion 

11. The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
These references include: 

1. U.S. Patent No. 5,781,430 issued to Tsai on 07/14/98.

2. U.S. Patent No. 5,452,410 issued to Magidson on 09/19/98.

12. All Claims are rejected. 

Any inquiry concerning this communication or earlier communications from the examiner should 
be directed to Suzanne Lo whose telephone number is (571)272-5876. The examiner can normally be 
reached on M-F, 8-4:30. 
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor,
Kamini Shah can be reached on (571)272-2297. The fax phone number for the organization where this
application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the Patent Application 
Information Retrieval (PAIR) system. Status information for published applications may be obtained 
from either Private PAIR or Public PAIR. Status information for unpublished applications is available 
through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov.
Should you have questions on access to the Private PAIR system, contact the Electronic
Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer 
Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR 
CANADA) or 571-272-1000. 



SL 

07/24/06 



Suzanne Lo 
Patent Examiner 
Art Unit 2128 



KAMINI SHAH 
SUPERVISORY PATENT EXAMINER 



